S-Index: a Hybrid Structure for Text Retrieval
Abstract
Two widely used classes of indexing methods are the Inverted Index and the Superimposed Coding based Signature File (SC-SF). The former is the most efficient in query processing but requires extra storage comparable in size to the textbase itself, whereas the latter is the most efficient in storage utilization. Building on the results of previous research [2], this study proposes a hybrid structure for text retrieval. The new structure, called S-Index, is shown to have tunable performance that ranges between two extremes. At one extreme, S-Index turns into a Signature File that involves zero information loss and, in this respect, is faster than the ordinary SC-SF method. At the other extreme, S-Index becomes an Inverted Index. The advantage of the proposed access method is that frequently queried sections of text are indexed via an Inverted Index, whereas the bulk of the textbase, which is not frequently targeted by user queries, is stored in the form of a Signature File.
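The abstract does not give the internal layout of S-Index, so the following is only a toy sketch of the general idea of mixing the two access methods, not the authors' structure. It assumes the split is made per document: a hypothetical set `hot_doc_ids` marks documents that get inverted-index postings, and every other document gets a superimposed-coding signature. The names `ToyHybridIndex`, `term_mask`, `SIG_BITS`, and `BITS_PER_TERM` are illustrative, and the hash scheme is an arbitrary choice.

```python
import hashlib

SIG_BITS = 64       # signature width in bits (illustrative choice)
BITS_PER_TERM = 3   # hash positions set per term (illustrative choice)


def term_mask(term):
    """Superimposed coding: map a term to a few bit positions in the signature."""
    mask = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return mask


class ToyHybridIndex:
    """Hot documents use inverted postings; cold documents use signatures."""

    def __init__(self, docs, hot_doc_ids):
        self.words = [set(d.lower().split()) for d in docs]
        self.postings = {}    # term -> set of hot document ids
        self.signatures = {}  # cold document id -> bit signature
        hot = set(hot_doc_ids)
        for doc_id, words in enumerate(self.words):
            if doc_id in hot:
                for w in words:
                    self.postings.setdefault(w, set()).add(doc_id)
            else:
                sig = 0
                for w in words:
                    sig |= term_mask(w)  # OR all term masks together
                self.signatures[doc_id] = sig

    def search(self, term):
        term = term.lower()
        # Exact lookup for the hot part of the collection.
        hits = set(self.postings.get(term, ()))
        # Signature scan for the cold part; a match is only a candidate,
        # so the stored words are checked to discard false drops.
        mask = term_mask(term)
        for doc_id, sig in self.signatures.items():
            if (sig & mask) == mask and term in self.words[doc_id]:
                hits.add(doc_id)
        return sorted(hits)
```

In this sketch, passing an empty `hot_doc_ids` leaves only signatures and passing every document id leaves only postings, which mirrors the two extremes the abstract describes. How S-Index actually selects and tunes the split is not stated here.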
Similar resources
A Hybrid Approach to Index Maintenance in Dynamic Text Retrieval Systems
In-place and merge-based index maintenance are the two main competing strategies for on-line index construction in dynamic information retrieval systems based on inverted lists. Motivated by recent results for both strategies, we investigate possible combinations of in-place and merge-based index maintenance. We present a hybrid approach in which long posting lists are updated in-place, while s...
Structural summaries as a core technology for efficient XML retrieval
The Extensible Markup Language (XML) is extremely popular as a generic markup language for text documents with an explicit hierarchical structure. The different types of XML data found in today’s document repositories, digital libraries, intranets and on the web range from flat text with little meaningful structure to be queried, over truly semistructured data with a rich and often irregular st...
Lightweight Integration of IR & DB for Scalable Hybrid Search with Integrated Ranking Support
The Web contains a large amount of documents and an increasing quantity of structured data in the form of RDF triples. Many of these triples are annotations associated with documents. While structured queries constitute the principal means to retrieve structured data, keyword queries are typically used for document retrieval. Clearly, a form of hybrid search that seamlessly integrates these for...
Image retrieval using the combination of text-based and content-based algorithms
Image retrieval is an important research field which has received great attention in the last decades. In this paper, we present an approach for the image retrieval based on the combination of text-based and content-based features. For text-based features, keywords and for content-based features, color and texture features have been used. Query in this system contains some keywords and an input...
Retrieving Documents with Geographic References Using a Spatial Index Structure Based on Ontologies
Both Geographic Information Systems and Information Retrieval have been very active research fields in the last decades. Lately, a new research field called Geographic Information Retrieval has appeared from the intersection of these two fields. The main goal of this field is to define index structures and techniques to efficiently store and retrieve documents using both the text and the geogra...